Distinguishing Humans from Robots in Web Search Logs
Authors
Abstract
The workload on web search engines is in fact multiclass, being derived from the activities of both human users and automated robots. It is important to distinguish between these two classes in order to reliably characterize human web search behavior and to study the effect of robot activity. We suggest an approach based on a multidimensional characterization of search sessions, and take first steps towards implementing it. We present several behavioral criteria, such as the maximal number of queries in a day, the query submittal rate, the minimal interval of time between successive queries, and more. By studying how these behavioral criteria differ between humans and robots, we were able to classify the users in the log files. To find the best classification, we used a grading method that allowed us to compare the results of several analyses. Our conclusion is that an analysis that combines several criteria may give the best classification of humans versus robots.
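A minimal sketch of how such a criteria-based classification might look in practice is given below (Python). The field names, the thresholds, and the majority-vote combination are illustrative assumptions for this sketch, not the paper's actual parameters or grading method.

# Illustrative sketch: combine several behavioral criteria with hypothetical
# thresholds to label a search-log user as "human" or "robot".
from dataclasses import dataclass
from typing import List

@dataclass
class UserSession:
    user_id: str
    query_times: List[float]   # query submission timestamps, in seconds

def classify_user(session: UserSession,
                  max_queries_per_day: int = 100,
                  min_interval_sec: float = 1.0,
                  max_rate_per_min: float = 5.0) -> str:
    """Return 'robot' if a majority of the behavioral criteria fire."""
    times = sorted(session.query_times)
    if len(times) < 2:
        return "human"

    votes = 0
    # Criterion 1: maximal number of queries in the (one-day) log slice.
    if len(times) > max_queries_per_day:
        votes += 1
    # Criterion 2: minimal interval between consecutive queries.
    gaps = [b - a for a, b in zip(times, times[1:])]
    if min(gaps) < min_interval_sec:
        votes += 1
    # Criterion 3: average query submittal rate (queries per minute).
    duration_min = (times[-1] - times[0]) / 60.0
    if duration_min > 0 and len(times) / duration_min > max_rate_per_min:
        votes += 1

    # Combining criteria: flag as robot when at least two criteria fire.
    return "robot" if votes >= 2 else "human"

For example, a session with 500 queries submitted a fraction of a second apart would trip all three criteria and be labeled a robot, whereas a handful of queries spread over an hour would not.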
Similar resources
Image flip CAPTCHA
The massive and automated access to Web resources through robots has made it essential for Web service providers to determine whether the "user" is a human or a robot. A Human Interaction Proof (HIP), like the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...
Representing a method to identify and contrast with the fraud which is created by robots for developing websites' traffic ranking
With the expansion of the Internet and the Web, communication and information gathering between individuals has shifted from its traditional forms onto web sites. The World Wide Web also offers a great opportunity for businesses to improve their relationship with clients and expand their marketplace in the online world. Businesses use a criterion called traffic ranking to determine their si...
Comparing the weblogs of Iranian libraries and librarians with top librarianship weblogs; 1385
Introduction: Weblogs are evident tools for librarians. There are three main ways of applying weblogs in librarianship: personal use by librarians to update their own knowledge, as a source of information about libraries, and for library services. The aim of this research is a comparison between Iranian libraries and librarians and superior librarianshi...
Distinguishing Humans from Bots in Web Search Logs
Cleaning workload data and separating it into classes is a necessary pre-requisite for workload characterization. In particular, the workload on web search engines is derived from the activities of both human users and automated bots. It is important to distinguish between these two classes in order to reliably characterize human web search behavior, and to study the effects of bot activity. Ho...
Modeling of Web Robot Navigational Patterns
In recent years, it is becoming increasingly difficult to ignore the impact of Web robots on both commercial and institutional Web sites. Not only do Web robots consume valuable bandwidth and Web server resources, they are also making it more difficult to apply Web Mining techniques effectively on Web logs. E-commerce Web sites are also concerned about unauthorized deployment of shopbots for the p...
Journal title:
Volume / Issue
Pages -
Publication year 2009